Abstract

Simulating believable facial animation is a topic of increasing interest in computer graphics and visual effects. In this paper we present a hybrid technique for generating facial animation from motion capture data. After capturing a range of facial expressions defined by the Facial Action Coding System (FACS), radial basis function (RBF) interpolation is used to transfer the motion data onto two facial models, one realistic and one stylized. The distances for the RBF technique are calculated in three variants: Euclidean-based, geodesic mesh-based and hybrid-based. The last one combines the advantages of the first two. To improve efficiency, the calculations are aided by precomputed distance data. The results are then evaluated quantitatively and qualitatively by comparing the animation outcomes with the real footage. Our findings show that the hybrid technique offers the best compromise between computational performance and animation fidelity when generating facial animation from motion capture.

Keywords: Facial Animation, Motion Capture, Radial 
Basis Function, RBF, Euclidean distance, Geodesic 
distance, Hybrid integration, Facial Expression, Facial 
Action Coding System, FACS. 

Nomenclature 

RBF Radial Basis Function 

FACS Facial Action Coding System 

1. Introduction 

Facial modelling and animation for digital characters have become some of the most demanding topics in computer graphics, particularly in feature films, video games and emerging media such as virtual and augmented reality. A growing body of research addresses the challenge of producing high-fidelity facial animation [1].

Due to the sensitivity of the human eye to facial features, facial animation is usually the most closely observed aspect of a character and the most prone to criticism [2]. Given the many subtleties found in a facial expression, such as wrinkles, micro expressions and the complex anatomical structures underneath, the face is one of the most difficult parts of the human body to simulate properly, and simulating it remains a very challenging task in computer graphics [3]. The face is the main element of communication and of the portrayal of emotion, so it is key to recreate it believably within a digital framework. In this respect, one of the main references for the analysis of facial motion is FACS, a thorough study for the classification of facial movements and the synthesis of facial expressions [4], which animators widely use today to study the complexity of the human face.

Motion capture has been gaining popularity in recent years, particularly for facial animation. The way motion data is collected varies with the requirements, and several techniques exist today to map the data onto a 3D model. One of the most common is RBF interpolation, a robust surface deformation technique for interpolating scattered data on a surface [5]. Conventionally, RBF techniques use the Euclidean distance norm, but other distance measures, such as the geodesic distance, are particularly useful for facial animation. While the former is computationally faster, it fails to deform areas with holes, such as the mouth and the eyes, properly. In these cases, geodesic calculations give better results, at the cost of a more complex algorithm [6].

In this paper we test both approaches and propose a prototype hybrid algorithm that aims to combine the advantages of the two methods, preserving the quality of the animation while speeding up the computations.

Figure 1 summarizes the steps of this process. First, we gather a collection of facial expressions using motion capture, based on the theory of FACS. Second, the RBF algorithm is implemented to compute the surface deformation and synthesize expressions on realistic and stylized 3D facial models using our motion capture data. Three variants of the RBF interpolation are proposed, depending on the distance norm: Euclidean-based, geodesic mesh-based and hybrid-based. Finally, the outcomes of each approach are evaluated with quantitative and qualitative methods.

The paper is organized as follows: Section 2 reviews the work on facial animation most relevant to this research. Section 3 explains the process used to collect the motion capture data. Section 4 describes the facial models and the RBF interpolation algorithm used for the data transfer. Section 5 focuses on each of the three distance approaches proposed in this project, starting with Euclidean, continuing with geodesic distance and finishing with our hybrid approach. Section 6 presents the results of our quantitative and qualitative evaluations, analyzing the advantages and disadvantages of each approach. Finally, Section 7 concludes the paper with some remarks and suggestions for future work.



2. Related Work

Facial animation has been developed since the work of Parke [7], who proposed the first computer-animated face. Current techniques for facial animation have evolved considerably and reached high levels of fidelity, with many different approaches, each adapted to particular requirements [2]. Remarkable examples include the work of Alexander et al. [8], with the photorealistic recreation of the animated face of an actor, Ichim et al. [9] on the use of physics-based simulation for the realistic animation of facial muscles and flesh, or Cong et al. [10], in which a muscle system was built to help sculpt facial expressions. Facial animation has also been applied in medicine, as in the work of Sifakis et al. [11] on the simulation of facial tissue and biomechanics.

Motion capture technologies have recently gained importance for the generation of facial animation. The motion is taken directly from the performance of real actors, and the data is then transferred onto a digital model for the synthesis of facial expressions. Recent research in this area includes the work of Ruhland et al. [12] for the synthesis of refined animation directly from motion capture data, or Zhang et al. [13] for motion capture retargeting with topology preservation.

RBF techniques are widely used together with motion capture technology and have become a very important tool for animation given their robustness and simplicity. First proposed by Hardy [14], these techniques approximate the deformation of a surface from a set of control points, which define the deformation through a particular distance function. Recent research efforts with RBF include the work of Man-dun et al. [15], who propose a simple RBF technique with Euclidean-based distance calculation, together with the capture of face texture, for the creation of realistic animation, or the work of Wan et al. [6], who propose an efficient RBF technique that approximates the geodesic distance with discrete calculations.

The use of FACS for the development and evaluation of facial animation has become more common in recent years, as in the work of Cosker et al. [16], who present a method for the creation and evaluation of a scanned 3D facial model using FACS, or the work of Villagrasa and Sanchez [17], who implement an animation system based on the theory of FACS.

In this paper, we use motion capture technology and implement a hybrid-based distance approach of the RBF technique for the synthesis of facial expressions, considering the theory of FACS for the acquisition of motion data and for the qualitative evaluation.

3. Motion Capture 

The configuration of our motion capture system is shown in Figure 2. The setup consisted of 12 infrared cameras arranged around a circle 5 meters in diameter, with the performer at the centre. In addition, a regular video camera was placed in front of the performer for recording purposes.

The performer wore a total of 41 reflective markers, placed following the guidelines of the MPEG-4 standard for the description of facial motion in space [18]. Figure 3 shows the arrangement of the markers on the face.

During the session, the performer acted out a selection of 24 FACS action units (AUs), divided into 6 groups: eyebrows, eyes, nose, lips, cheeks and jaw. Table 1 lists the units in each group. Each unit was performed twice, and the better of the two takes was kept. After the session, the motion capture data was manually post-processed for data labelling, gap filling and clean-up.



Figure 2. Motion capture set up. 
Figure 3: Marker layout (left) and arrangement on the performer’s face (right).


Eyebrows: AU01 (inner brow raiser), AU02 (outer brow raiser), AU04 (brow lowerer)
Eyes: AU43 (lid drop), AU44 (squint), AU46 (wink)
Nose: AU10 (upper lip raiser), AU11 (nasolabial deepener), AU39 (nostril compressor)
Lips: AU12 (lip corner puller), AU13 (sharp lip puller), AU14 (dimpler), AU15 (lip corner depressor), AU16 (lower lip depressor), AU17 (chin raiser), AU18 (lip pucker)
Cheeks: AU06 (cheek raiser), AU33 (cheek blow), AU34 (cheek puff), AU35 (cheek suck)
Jaw: AU26 (jaw drop), AU27 (mouth stretch), AU29.1 (jaw thrust front), AU29.2 (jaw thrust back), AU30.1 (jaw to the right), AU30.2 (jaw to the left)


Table 1: Selection of FACS units for our experiments. 


4. Facial Animation 

Two generic humanoid 3D models, one realistic and one stylized, were used as targets for the motion capture data. The topology of both models was trimmed to the facial area, hiding the ears and the back of the head, which resulted in 2,746 vertices for the realistic model and 3,774 vertices for the stylized one. Figure 4 shows each of these models with the motion capture markers attached.

After matching the markers on the models to the motion capture data, we implemented the RBF interpolation algorithm and executed it with each of our three proposed variants to deform the surface of the facial models.



Figure 4: Realistic (left) and stylized (right) facial models with the markers placed on top.


4.1. RBF Interpolation Algorithm 

Using the mesh of each facial model as the surface and the motion capture markers as a set of control points on it, the RBF function was defined to satisfy the displacement $D_j$ of each control point:

$$D_j = f(m_j), \quad 0 \le j \le M-1 \tag{1}$$

$$f(m_j) = \sum_{k=0}^{M-1} W_j \, \varphi(\lVert m_j - m_k \rVert), \quad 0 \le j \le M-1 \tag{2}$$

where $M$ is the total number of control points, $m_j$ denotes the coordinates of control point $j$, and $\varphi(\lVert m_j - m_k \rVert)$ is the Gaussian RBF described by Equation (3) below. $\lVert m_j - m_k \rVert$ is the norm of the distance between the two points, which we calculate with the three different approaches discussed in Section 5. The weights $W_j$ were computed with Equation (4), using $\gamma = 2$ in Equation (3):

$$\varphi(\lVert m_j - m_k \rVert) = e^{-\lVert m_j - m_k \rVert^{\gamma}} \tag{3}$$

$$W_j = \frac{D_j}{\sum_{k=0}^{M-1} \varphi(\lVert m_j - m_k \rVert)}, \quad 0 \le j \le M-1 \tag{4}$$

After obtaining the weights $W_j$ for the $M$ control points through Equation (4), the displacement $D_{v_i}$ of the $i$-th vertex $v_i$ of a facial model with $N$ vertices is determined by Equation (5):

$$D_{v_i} = \sum_{j=0}^{M-1} W_j \, \varphi(\lVert v_i - m_j \rVert), \quad 0 \le i \le N-1 \tag{5}$$
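A minimal pure-Python sketch of Equations (3) to (5) is given below. It assumes the distances have already been computed and are passed in as nested lists (marker_dist[j][k] for the distance between markers j and k, vertex_dist[i][j] for the distance between vertex i and marker j), and that each displacement is an (x, y, z) tuple; all function and variable names are hypothetical. It follows Eq. (4) as written, normalizing each marker displacement by the sum of its kernel values rather than solving a linear system.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def gaussian_kernel(r: float, gamma: float = 2.0) -> float:
    """Eq. (3): phi(r) = exp(-r**gamma); gamma = 2 gives a Gaussian."""
    return math.exp(-(r ** gamma))


def rbf_weights(marker_dist: Sequence[Sequence[float]],
                displacements: Sequence[Vec3],
                gamma: float = 2.0) -> List[Vec3]:
    """Eq. (4): W_j = D_j / sum_k phi(||m_j - m_k||).

    marker_dist[j][k] holds ||m_j - m_k||. The diagonal is 0, so phi = 1
    there and the denominator is always >= 1 (no division by zero).
    """
    weights: List[Vec3] = []
    for j, d_j in enumerate(displacements):
        denom = sum(gaussian_kernel(r, gamma) for r in marker_dist[j])
        weights.append((d_j[0] / denom, d_j[1] / denom, d_j[2] / denom))
    return weights


def vertex_displacements(vertex_dist: Sequence[Sequence[float]],
                         weights: Sequence[Vec3],
                         gamma: float = 2.0) -> List[Vec3]:
    """Eq. (5): D_vi = sum_j W_j * phi(||v_i - m_j||).

    vertex_dist[i][j] holds ||v_i - m_j|| for vertex i and marker j.
    """
    result: List[Vec3] = []
    for row in vertex_dist:
        x = y = z = 0.0
        for r, w in zip(row, weights):
            k = gaussian_kernel(r, gamma)
            x += w[0] * k
            y += w[1] * k
            z += w[2] * k
        result.append((x, y, z))
    return result


# Two markers one unit apart; only the first one moves (along x).
w = rbf_weights([[0.0, 1.0], [1.0, 0.0]], [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)])
# Vertex 0 sits on marker 0; vertex 1 is far from both markers.
d = vertex_displacements([[0.0, 1.0], [10.0, 10.0]], w)
```

In this toy case the vertex that coincides with the moving marker receives 1/(1 + e^-1), about 0.73, of the marker's displacement, while the distant vertex barely moves. Because Eq. (4) normalizes by the kernel sum, the marker displacements are not reproduced exactly; a classical RBF interpolant would instead solve the M × M linear system formed by the kernel values for the weights.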

4.2. Distance Matrices 

To avoid redundant calculations in each iteration of the RBF interpolation, the distances between each vertex and each marker were precomputed before running the RBF algorithm and stored in a distance matrix. This added flexibility and significantly sped up the computations. Three distance matrices were used: a Euclidean matrix, a geodesic mesh-based matrix and a hybrid-based matrix.
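The precomputation can be sketched as follows, assuming that every marker is attached to a mesh vertex so that any of the three distance measures can be written as a function of two vertex indices. Besides the vertex-to-marker distances, Equation (4) also needs marker-to-marker distances, so the sketch stores both. The names build_distance_matrices and euclidean_by_index are hypothetical, and the call counter only shows that the distance function is evaluated M·M + N·M times up front.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
DistFn = Callable[[int, int], float]  # distance between two vertex indices


def build_distance_matrices(num_vertices: int,
                            marker_vertex_ids: Sequence[int],
                            dist: DistFn) -> Tuple[List[List[float]], List[List[float]]]:
    """Precompute the marker-marker (M x M) and vertex-marker (N x M) matrices.

    Works with any distance function (Euclidean, geodesic or hybrid), since
    markers are assumed to be snapped to mesh vertices.
    """
    marker_dist = [[dist(a, b) for b in marker_vertex_ids] for a in marker_vertex_ids]
    vertex_dist = [[dist(i, m) for m in marker_vertex_ids] for i in range(num_vertices)]
    return marker_dist, vertex_dist


# Toy mesh positions and an instrumented Euclidean distance that counts calls.
positions: List[Vec3] = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
calls = 0


def euclidean_by_index(i: int, j: int) -> float:
    global calls
    calls += 1
    return math.dist(positions[i], positions[j])


# Markers sit on vertices 0 and 2.
marker_dist, vertex_dist = build_distance_matrices(len(positions), [0, 2], euclidean_by_index)
```

With N = 3 vertices and M = 2 markers the distance function runs 10 times; after that, every animation frame can reuse the stored values, since only the marker displacements change between frames.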

5. Distance Calculation 

At the core of the RBF interpolation is the distance 
calculation algorithm. Three different approaches were 
tested in our experiments: Euclidean-based, Geodesic 
mesh-based and a hybrid-based method integrating the 
previous two. 

5.1. Euclidean-Based Distance 

The Euclidean norm measures the straight-line distance between two points:

$$\lVert p - q \rVert_{\mathrm{Euc}} = \sqrt{(p_x - q_x)^2 + (p_y - q_y)^2 + (p_z - q_z)^2} \tag{6}$$

Although it is one of the simplest and fastest ways to evaluate the RBF function, it creates artifacts in areas of the face that contain holes, such as the mouth and the eyes, as discussed in Section 6 below.
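To illustrate why the Euclidean norm causes artifacts around openings, the following sketch evaluates Eq. (6) and the Gaussian kernel of Eq. (3) (with γ = 2) for a hypothetical marker on the upper lip and a vertex on the lower lip. The coordinates are invented and in arbitrary units.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def euclidean(p: Vec3, q: Vec3) -> float:
    """Eq. (6): straight-line distance between two 3D points."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 + (p[2] - q[2]) ** 2)


def gaussian_kernel(r: float, gamma: float = 2.0) -> float:
    """Kernel of Eq. (3) with gamma = 2."""
    return math.exp(-(r ** gamma))


# Hypothetical points separated only by the mouth opening.
upper_lip_marker: Vec3 = (0.0, 0.1, 0.0)
lower_lip_vertex: Vec3 = (0.0, -0.1, 0.0)

r = euclidean(upper_lip_marker, lower_lip_vertex)
influence = gaussian_kernel(r)  # weight of the upper-lip marker on the lower lip
```

The two points are only 0.2 units apart in a straight line, so the kernel value is about 0.96: moving the upper-lip marker drags the lower lip almost as much, even though the lips are not connected through the mesh.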
5.2. Geodesic Mesh-Based Distance

Geodesic distance provides a more accurate measure of 
the distance between two points along a surface. There 
are several methods to find the geodesic path between 
two points on a surface. For the scope of our experiments, 
we developed a simple approximation to calculate the 
geodesic distance, based on the topology of the mesh. 

In graph theory, the geodesic distance between two vertices is the length of the shortest path between them along the edges of the graph; on a mesh, this is known as the mesh-based distance. In this approach, we used one of the most common shortest-path algorithms, A-Star (A*), which extends Dijkstra’s search algorithm with a heuristic [19, 20].

For each $\lVert p - q \rVert$ calculation, the shortest path between $p$ and $q$ is found using A-Star. Letting $V$ be the list of vertices that form the shortest path, the geodesic mesh-based distance is the sum of the Euclidean distances between each pair of consecutive vertices:

$$\lVert p - q \rVert_{\mathrm{Geo}} = \sum_{a=0}^{\mathrm{length}(V)-2} \lVert V[a] - V[a+1] \rVert_{\mathrm{Euc}} \tag{7}$$

Although it is more costly, this method approximates the facial deformation better and removes the artifacts that appear with the Euclidean-based approach, as previously reported by Wan et al. [6].
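A minimal sketch of the geodesic mesh-based distance follows: A* over the mesh edge graph, using the straight-line distance to the target as heuristic, and then Eq. (7) summed along the returned path. The adjacency-list representation and the function names are assumptions for illustration, not the implementation used in this paper. The toy mesh is a hypothetical four-vertex ring around a mouth opening: the upper and lower lip vertices share no edge, so any path must pass through a mouth corner.

```python
import heapq
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def euclidean(p: Vec3, q: Vec3) -> float:
    """Eq. (6): straight-line distance."""
    return math.dist(p, q)


def a_star_path(positions: Sequence[Vec3], adjacency: Dict[int, List[int]],
                start: int, goal: int) -> Optional[List[int]]:
    """Shortest vertex path along mesh edges, found with A*.

    The heuristic is the straight-line distance to the goal. It never
    overestimates the remaining edge-path length and is consistent, so a
    vertex's cost is final once it has been expanded.
    """
    open_heap: List[Tuple[float, int]] = [(euclidean(positions[start], positions[goal]), start)]
    g_cost: Dict[int, float] = {start: 0.0}
    came_from: Dict[int, int] = {}
    closed = set()
    while open_heap:
        _, current = heapq.heappop(open_heap)
        if current == goal:
            # Walk the predecessor links back to the start vertex.
            path = [current]
            while current in came_from:
                current = came_from[current]
                path.append(current)
            return path[::-1]
        if current in closed:
            continue  # stale heap entry
        closed.add(current)
        for nbr in adjacency.get(current, []):
            tentative = g_cost[current] + euclidean(positions[current], positions[nbr])
            if tentative < g_cost.get(nbr, math.inf):
                g_cost[nbr] = tentative
                came_from[nbr] = current
                f_score = tentative + euclidean(positions[nbr], positions[goal])
                heapq.heappush(open_heap, (f_score, nbr))
    return None  # goal not reachable from start


def geodesic_mesh_distance(positions: Sequence[Vec3], adjacency: Dict[int, List[int]],
                           p: int, q: int) -> float:
    """Eq. (7): sum of Euclidean lengths between consecutive path vertices."""
    path = a_star_path(positions, adjacency, p, q)
    if path is None:
        return math.inf
    return sum(euclidean(positions[path[a]], positions[path[a + 1]])
               for a in range(len(path) - 1))


# Hypothetical toy mesh: 0 = upper lip, 1 = lower lip, 2 and 3 = mouth corners.
# There is no edge 0-1, because the mouth opening separates the lips.
positions: List[Vec3] = [(0.0, 0.1, 0.0), (0.0, -0.1, 0.0), (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
adjacency: Dict[int, List[int]] = {0: [2, 3], 1: [2, 3], 2: [0, 1], 3: [0, 1]}

euc = euclidean(positions[0], positions[1])
geo = geodesic_mesh_distance(positions, adjacency, 0, 1)
```

Here the straight-line distance across the opening is 0.2, whereas the mesh-based distance is 2·√1.01, about 2.01, so an upper-lip marker has far less influence on the lower lip when the geodesic measure is used.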

5.3. Hybrid-Based Distance 

Since the geodesic mesh-based distance is costlier to calculate, in this approach we restrict it to the critical areas of the face and use Euclidean distances for the other areas. Figure 5 shows the partition of the realistic and stylized models into regions: geodesic mesh-based distances are calculated in the regions highlighted in yellow, and Euclidean distances in the remaining regions.



Figure 5: Realistic (left) and stylized (right) facial models with the selected regions for geodesic mesh-based distance calculations and the unselected regions for Euclidean distance computations.


To avoid artifacts at the borders between the regions where geodesic mesh-based distances are calculated and those where Euclidean distances are used, we introduce the hybrid distance $\lVert p - q \rVert_{\mathrm{Hyb}}$, formulated in Eq. (8), which creates a smooth transition at the borders. A weight $w$, assigned in the range [0.0, 1.0] according to the proximity to a geodesic region, interpolates between the geodesic mesh-based distance and the Euclidean distance. The decay function used for the weight is based on a Gaussian curve with $\gamma = 2$.


$$\lVert p - q \rVert_{\mathrm{Hyb}} = w \, \lVert p - q \rVert_{\mathrm{Geo}} + (1 - w) \, \lVert p - q \rVert_{\mathrm{Euc}} \tag{8}$$

Values of $w$ of 0.6 or less are treated as insignificant for the geodesic calculations. In these cases, the algorithm bypasses the geodesic mesh-based calculation, which speeds up the computation of the hybrid distance.
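The weighting and the bypass rule can be sketched as follows. The text does not specify how proximity to a geodesic region is measured or how the Gaussian decay is scaled, so dist_to_region and sigma are hypothetical parameters, and returning the plain Euclidean distance when w ≤ 0.6 is an assumed interpretation of the bypass. The geodesic term is passed as a callable so that the costly path search only runs when it is actually needed.

```python
import math
from typing import Callable

BYPASS_THRESHOLD = 0.6  # from the text: w <= 0.6 is treated as insignificant


def hybrid_weight(dist_to_region: float, sigma: float = 1.0, gamma: float = 2.0) -> float:
    """Gaussian-style decay: 1.0 inside a geodesic region, falling towards 0 away from it.

    dist_to_region and sigma are hypothetical: how proximity to a geodesic
    region is measured and scaled is not specified.
    """
    return math.exp(-((dist_to_region / sigma) ** gamma))


def hybrid_distance(euc: float, geo_fn: Callable[[], float], w: float) -> float:
    """Eq. (8): w * geo + (1 - w) * euc, skipping the geodesic term when w <= 0.6.

    geo_fn is called lazily, so the costly path search only runs when needed.
    Returning the plain Euclidean distance below the threshold is an assumed
    reading of the bypass rule.
    """
    if w <= BYPASS_THRESHOLD:
        return euc
    return w * geo_fn() + (1.0 - w) * euc


geo_calls = 0


def expensive_geodesic() -> float:
    """Stand-in for an A-Star query; returns a fixed value for illustration."""
    global geo_calls
    geo_calls += 1
    return 2.0


inside = hybrid_distance(0.2, expensive_geodesic, hybrid_weight(0.0))  # w = 1.0
border = hybrid_distance(0.2, expensive_geodesic, hybrid_weight(0.5))  # w ~ 0.78
far = hybrid_distance(0.2, expensive_geodesic, hybrid_weight(1.0))     # w ~ 0.37, bypassed
```

A vertex inside a geodesic region (w = 1) gets the pure geodesic distance, a vertex near the border blends the two, and a vertex farther away falls under the threshold and never triggers the path search, which is where the time saving of the hybrid approach comes from.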

The next section presents the results and the evaluation of each of these approaches in detail.



Table 2: Extract of some of the animation results after applying the three RBF interpolation approaches to the realistic and the stylized model. The largest differences appear between the Euclidean and the geodesic approach, particularly around the mouth and the eyes.
6. Results

To compare the three approaches for our RBF interpolation, each distance technique was used to transfer the captured FACS-based facial movements onto the realistic and the stylized model described in Section 4. Table 2 shows an extract of our animation outcomes, highlighting the differences in surface deformation between the approaches.

6.1. Performance Comparison 

For the performance analysis, the RBF interpolation algorithms were tested on a 2.80 GHz Intel Core i7-7700HQ CPU with 16 GB of RAM and an NVIDIA GeForce GTX 1050 Ti graphics card, running in Autodesk Maya 2017 under Windows 10.

Figures 6 and 7 show the time needed to compute the RBF interpolation matrix with each approach. Once these computations were done, the average calculation cost per vertex was 0.1 ms.

The Euclidean-based approach is the fastest, given its simplicity. Of the other two, the hybrid-based approach significantly reduces the computation time compared to the geodesic mesh-based approach. In every case, the use of distance matrices speeds up the calculations significantly.

[Bar chart of computational times for the Euclidean, Geodesics and Hybrid approaches; recovered bar labels: 462.51 s, 288.82 s, 9.38 s]

Figure 6: Computational times without distance matrices.


[Bar chart of computational times for the Euclidean, Geodesics and Hybrid approaches; recovered bar label: 5.05 s]

Figure 7: Computational times with distance matrices.

6.2. Quality Comparison 

The animation outcomes were compared with the real footage, and a qualitative analysis was carried out for each of the six groups into which we divided the face: eyebrows, eyes, nose, lips, cheeks and jaw.

The results show that the Euclidean-based approach is unsuitable for the synthesis of facial expressions, given the artifacts that occur in areas with holes. To avoid this, authors such as Man-dun et al. [15] propose dividing the face into separate regions whose borders are connected by interpolation.


The outcomes of the geodesic mesh-based approach show a better synthesis of facial expressions than the Euclidean-based approach. Although it relies on a simple A-Star-based algorithm, it reproduces the overall facial features properly, with only slight distortions in the mesh caused by the approximation of the heuristic. More complex and efficient approximations to this problem have been proposed, such as that of Wan et al. [6].

Finally, the hybrid-based approach produces a synthesis of facial expressions very similar to that of the geodesic mesh-based approach, with proper deformation around the mouth and the eyes. In addition, the small artifacts that appear with the geodesic mesh-based approach are corrected by the interpolation with the Euclidean-based calculations, which results in smoother surface deformations.

For the stylized model in particular, the eyelid deformations do not match the real footage accurately, owing to the high mesh density around the eyes and the difference in proportions compared to the performer.

Overall, our outcomes show that, among the three 
approaches presented in this paper, the hybrid-based 
approach finds the best compromise between 
computational performance and animation fidelity. 

7. Conclusions 

In this paper, we presented a hybrid technique for the synthesis of facial expressions in digital characters based on motion capture data. We implemented the RBF interpolation algorithm with three distance calculation variants: Euclidean-based, geodesic mesh-based and hybrid-based, the latter combining the advantages of the other two. To reduce the computational time, the calculations were aided by precomputed distance data. In our experiments, we mapped a selection of motion capture data based on FACS and evaluated the performance and quality of each approach. The results show the efficiency of the hybrid technique for the synthesis of facial animation and the benefit of using distance matrices for better performance.

Based on this research, there are several potential paths for future development. The wide range of expressions generated in this research could easily lead to the creation of a detailed facial rig based on blendshapes, a solution that other authors have implemented successfully [2]. It would also be interesting to generate facial models using techniques such as photogrammetry. In terms of evaluation, the qualitative results could be explored further with additional data and with perceptual experiments that measure how viewers perceive the animation fidelity of our methods. In addition, scale transfer algorithms could be applied to models that distort human features, such as the stylized face, to better approximate the expressions in the real footage. Finally, surface deformation variants beyond those presented here could be explored, such as more complex geodesic techniques, machine learning and partial differential equations, the latter two being emerging areas of research for generating realistic facial animation from motion capture. Further study in these areas is needed to develop new solutions for high-quality, realistic and believable facial animation.